Autoregressive model

In statistics and signal processing, an autoregressive (AR) model is a type of random process which is often used to model and predict various types of natural phenomena. The autoregressive model is one of a group of linear prediction formulas that attempt to predict an output of a system based on the previous outputs.


Definition

The notation AR(p) indicates an autoregressive model of order p. The AR(p) model is defined as

 X_t = c + \sum_{i=1}^p \varphi_i X_{t-i} + \varepsilon_t

where \varphi_1, \ldots, \varphi_p are the parameters of the model, c is a constant (often omitted for simplicity) and \varepsilon_t is white noise.

An autoregressive model can thus be viewed as the output of an all-pole infinite impulse response filter whose input is white noise.

Some constraints on the parameter values are necessary for the model to remain wide-sense stationary. For example, an AR(1) process with |\varphi_1| \ge 1 is not stationary. More generally, for an AR(p) model to be wide-sense stationary, the roots of the polynomial \textstyle z^p - \sum_{i=1}^p \varphi_i z^{p-i} must lie within the unit circle, i.e., each root z_i must satisfy |z_i| < 1.
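This condition can be checked numerically by computing the roots of the polynomial above; a minimal Python/numpy sketch (the helper name is_stationary is illustrative, not a standard library function):

  import numpy as np

  # Test wide-sense stationarity of an AR(p) model by checking that all
  # roots of z^p - phi_1 z^{p-1} - ... - phi_p lie strictly inside the
  # unit circle.
  def is_stationary(phi):
      phi = np.asarray(phi, dtype=float)
      poly = np.concatenate(([1.0], -phi))  # coefficients, highest power first
      return bool(np.all(np.abs(np.roots(poly)) < 1))

  print(is_stationary([0.5]))        # AR(1) with |phi| < 1  -> True
  print(is_stationary([1.2]))        # AR(1) with |phi| >= 1 -> False
  print(is_stationary([0.5, -0.3]))  # AR(2) -> True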

Example: An AR(1)-process

An AR(1)-process is given by:

X_t = c + \varphi X_{t-1} + \varepsilon_t

where \varepsilon_t is a white noise process with zero mean and variance \sigma_\varepsilon^2. (The subscript on \varphi_1 has been dropped since p = 1.) The process is wide-sense stationary if |\varphi|<1, since it is then obtained as the output of a stable filter whose input is white noise. (If \varphi=1, then X_t is a random walk, whose variance grows without bound, so it is not wide-sense stationary.) Consequently, assuming |\varphi|<1, the mean \operatorname{E}(X_t) is identical for all values of t. If the mean is denoted by \mu, it follows from

\operatorname{E} (X_t)=\operatorname{E} (c)+\varphi\operatorname{E} (X_{t-1})+\operatorname{E}(\varepsilon_t),

that

 \mu=c+\varphi\mu+0,

and hence

\mu=\frac{c}{1-\varphi}.

In particular, if c = 0, then the mean is 0.

The variance is

\textrm{var}(X_t)=\operatorname{E}(X_t^2)-\mu^2=\frac{\sigma_\varepsilon^2}{1-\varphi^2},

where \sigma_\varepsilon is the standard deviation of \varepsilon_t. This can be shown by noting that

\textrm{var}(X_t) = \varphi^2\textrm{var}(X_{t-1}) + \sigma_\varepsilon^2,

and then observing that the quantity above is a stable fixed point of this relation.
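These results can be illustrated by simulation; a minimal Python/numpy sketch (parameter values chosen arbitrarily) comparing the sample mean and variance of a simulated AR(1) series with \mu = c/(1-\varphi) and \sigma_\varepsilon^2/(1-\varphi^2):

  import numpy as np

  # Simulate a long AR(1) series X_t = c + phi*X_{t-1} + eps_t and compare
  # sample moments with the stationary values derived above.
  rng = np.random.default_rng(0)
  c, phi, sigma = 2.0, 0.7, 1.0
  n = 200_000

  eps = rng.normal(0.0, sigma, size=n)
  x = np.empty(n)
  x[0] = c / (1 - phi)                       # start at the stationary mean
  for t in range(1, n):
      x[t] = c + phi * x[t - 1] + eps[t]

  print(x.mean(), c / (1 - phi))             # both close to 6.667
  print(x.var(), sigma**2 / (1 - phi**2))    # both close to 1.961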

The autocovariance is given by

B_n=\operatorname{E}(X_{t+n}X_t)-\mu^2=\frac{\sigma_\varepsilon^2}{1-\varphi^2}\,\,\varphi^{|n|}.

It can be seen that the autocovariance function decays with a decay time (also called time constant) of \tau=-1/\ln(\varphi). [To see this, write B_n=K\varphi^{|n|} where K is independent of n. Then note that \varphi^{|n|}=e^{|n|\ln\varphi} and match this to the exponential decay law e^{-n/\tau}.] For example, with \varphi=0.9 the decay time is \tau=-1/\ln(0.9)\approx 9.5 sampling intervals.

The spectral density function is the Fourier transform of the autocovariance function. In discrete terms this will be the discrete-time Fourier transform:

\Phi(\omega)=
\frac{1}{\sqrt{2\pi}}\,\sum_{n=-\infty}^\infty B_n e^{-i\omega n}
=\frac{1}{\sqrt{2\pi}}\,\left(\frac{\sigma_\varepsilon^2}{1+\varphi^2-2\varphi\cos(\omega)}\right).

This expression is periodic due to the discrete nature of the X_t, which is manifested as the cosine term in the denominator. If we assume that the sampling time (\Delta t=1) is much smaller than the decay time (\tau), then we can use a continuum approximation to B_n:

B(t)\approx \frac{\sigma_\varepsilon^2}{1-\varphi^2}\,\,\varphi^{|t|}

which yields a Lorentzian profile for the spectral density:

\Phi(\omega)=
\frac{1}{\sqrt{2\pi}}\,\frac{\sigma_\varepsilon^2}{1-\varphi^2}\,\frac{\gamma}{\pi(\gamma^2+\omega^2)}

where \gamma=1/\tau is the angular frequency associated with the decay time \tau.
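The closed form for \Phi(\omega) can be verified numerically; a Python/numpy sketch that sums the discrete-time Fourier transform of B_n directly and compares it with the expression above (arbitrary parameter values):

  import numpy as np

  # Sum the DTFT of the autocovariance B_n and compare it with the closed
  # form above (same 1/sqrt(2*pi) normalization).
  phi, sigma = 0.7, 1.0
  omega = 1.3                          # an arbitrary frequency in (-pi, pi)

  n = np.arange(-2000, 2001)           # the geometric tail is negligible here
  B = sigma**2 / (1 - phi**2) * phi**np.abs(n)
  dtft = np.real(np.sum(B * np.exp(-1j * omega * n))) / np.sqrt(2 * np.pi)

  closed = (sigma**2 / (1 + phi**2 - 2 * phi * np.cos(omega))) / np.sqrt(2 * np.pi)
  print(dtft, closed)                  # the two values agree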

An alternative expression for X_t can be derived by first substituting c+\varphi X_{t-2}+\varepsilon_{t-1} for X_{t-1} in the defining equation. Continuing this process N times yields

X_t=c\sum_{k=0}^{N-1}\varphi^k+\varphi^NX_{t-N}+\sum_{k=0}^{N-1}\varphi^k\varepsilon_{t-k}.

As N approaches infinity, \varphi^N approaches zero (since |\varphi|<1) and:

X_t=\frac{c}{1-\varphi}+\sum_{k=0}^\infty\varphi^k\varepsilon_{t-k}.

It is seen that X_t is white noise convolved with the \varphi^k kernel plus the constant mean. If the white noise \varepsilon_t is a Gaussian process then X_t is also a Gaussian process. In other cases, the central limit theorem indicates that X_t will be approximately normally distributed when \varphi is close to one.
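This moving-average form can be checked by simulation; a Python/numpy sketch comparing a sample generated by the AR(1) recursion with the truncated convolution (\varphi^K is negligible for the K used):

  import numpy as np

  # A sample generated by the recursion should equal c/(1-phi) plus the
  # noise convolved with the phi^k kernel.
  rng = np.random.default_rng(1)
  c, phi = 0.5, 0.8
  n = 450

  eps = rng.normal(size=n)
  x = np.zeros(n)
  x[0] = c / (1 - phi)
  for t in range(1, n):
      x[t] = c + phi * x[t - 1] + eps[t]

  K = 200                              # truncation: phi**K ~ 4e-20
  t = n - 1
  ma = c / (1 - phi) + sum(phi**k * eps[t - k] for k in range(K))
  print(x[t], ma)                      # agree to numerical precision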

Calculation of the AR parameters

There are many ways to estimate the coefficients, such as ordinary least squares (OLS), the method of moments (via the Yule-Walker equations), or Markov chain Monte Carlo (MCMC).

The AR(p) model is given by the equation

 X_t = \sum_{i=1}^p \varphi_i X_{t-i} + \varepsilon_t.

It is based on parameters \varphi_i where i = 1, ..., p. There is a direct correspondence between these parameters and the covariance function of the process, and this correspondence can be inverted to determine the parameters from the autocorrelation function (which is itself obtained from the covariances). This is done using the Yule-Walker equations.

Yule-Walker equations

The Yule-Walker equations are the following set of equations.


\gamma_m = \sum_{k=1}^p \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2\delta_{m,0},

where m = 0, ... , p, yielding p + 1 equations. \gamma_m is the autocorrelation function of X, \sigma_\varepsilon is the standard deviation of the input noise process, and \delta_{m,0} is the Kronecker delta function.

Because the last term is non-zero only if m = 0, the set of equations is usually solved by representing the equations for m > 0 in matrix form, giving the equation

\begin{bmatrix}
\gamma_1 \\
\gamma_2 \\
\gamma_3 \\
\vdots \\
\end{bmatrix}
=
\begin{bmatrix}
\gamma_0 & \gamma_{-1} & \gamma_{-2} & \dots \\
\gamma_1 & \gamma_0 & \gamma_{-1} & \dots \\
\gamma_2 & \gamma_1 & \gamma_0 & \dots \\
\vdots & \vdots & \vdots & \ddots \\
\end{bmatrix}
\begin{bmatrix}
\varphi_1 \\
\varphi_2 \\
\varphi_3 \\
\vdots \\
\end{bmatrix}

which can be solved for all \varphi. For m = 0 we have


\gamma_0 = \sum_{k=1}^p \varphi_k \gamma_{-k} + \sigma_\varepsilon^2

which allows us to solve for \sigma_\varepsilon^2.

The above equations (the Yule-Walker equations) provide one route to estimating the parameters of an AR(p) model, by replacing the theoretical covariances with estimated values. One way of specifying the estimated covariances is equivalent to a calculation using least squares regression of values X_t on the p previous values of the same series.
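This route can be implemented in a few lines; a minimal Python/numpy sketch (the helper name yule_walker is illustrative; packages such as R, via its ar function, provide tested implementations). It forms sample autocovariances, solves the m > 0 system for \varphi, and uses the m = 0 equation for \sigma_\varepsilon^2:

  import numpy as np

  def yule_walker(x, p):
      # Estimate AR(p) coefficients and noise variance from a series x by
      # solving the Yule-Walker equations with sample autocovariances.
      x = np.asarray(x, dtype=float) - np.mean(x)
      n = len(x)
      gamma = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])
      R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
      phi = np.linalg.solve(R, gamma[1:])         # the m > 0 system above
      sigma2 = gamma[0] - phi @ gamma[1:]         # the m = 0 equation
      return phi, sigma2

  # Recover the parameters of a simulated AR(2) process
  rng = np.random.default_rng(2)
  x = np.zeros(100_000)
  eps = rng.normal(size=len(x))
  for t in range(2, len(x)):
      x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + eps[t]
  print(yule_walker(x, 2))    # approximately ([0.6, -0.3], 1.0)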

Another usage is calculating the first p+1 values \rho(0), \ldots, \rho(p) of the autocorrelation function. The full autocorrelation function can then be derived by recursively calculating [1]

\rho(\tau) = \sum_{k=1}^p \varphi_k \rho(\tau-k)

(a numerical sketch of this recursion follows the examples below).
Examples for some low-order AR(p) processes
  • p=1
    • \gamma_1 = \varphi_1 \gamma_0
    • Hence \rho_1 = \gamma_1 / \gamma_0 = \varphi_1
  • p=2
    • The Yule-Walker equations for an AR(2) process are
      \gamma_1 = \varphi_1 \gamma_0 + \varphi_2 \gamma_{-1}
      \gamma_2 = \varphi_1 \gamma_1 + \varphi_2 \gamma_0
      • Remember that \gamma_{-k} = \gamma_k
      • Using the first equation yields \rho_1 = \gamma_1 / \gamma_0 = \frac{\varphi_1}{1-\varphi_2}
      • Using the recursion formula yields \rho_2 = \gamma_2 / \gamma_0 = \frac{\varphi_1^2 - \varphi_2^2 + \varphi_2}{1-\varphi_2}
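A minimal Python/numpy sketch of the recursion for the AR(2) case, seeded with the closed forms just derived (the helper name acf_ar2 is illustrative):

  import numpy as np

  # Extend the AR(2) autocorrelation function with the recursion
  # rho(tau) = phi_1 rho(tau-1) + phi_2 rho(tau-2), seeded with rho_0 = 1
  # and rho_1 = phi_1 / (1 - phi_2) from above.
  def acf_ar2(phi1, phi2, n_lags):
      rho = np.empty(n_lags + 1)
      rho[0] = 1.0
      rho[1] = phi1 / (1 - phi2)
      for tau in range(2, n_lags + 1):
          rho[tau] = phi1 * rho[tau - 1] + phi2 * rho[tau - 2]
      return rho

  phi1, phi2 = 0.6, -0.3
  rho = acf_ar2(phi1, phi2, 5)
  # rho_2 matches the closed form above:
  print(rho[2], (phi1**2 - phi2**2 + phi2) / (1 - phi2))   # both ~ -0.0231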

Derivation

The equation defining the AR process is

 X_t = \sum_{i=1}^p \varphi_i\,X_{t-i} + \varepsilon_t.

Multiplying both sides by X_{t-m} and taking the expected value yields

\operatorname{E}[X_t X_{t-m}] = \operatorname{E}\left[\sum_{i=1}^p \varphi_i\,X_{t-i} X_{t-m}\right] + \operatorname{E}[\varepsilon_t X_{t-m}].

Now, \operatorname{E}[X_t X_{t-m}] = \gamma_m by definition of the autocorrelation function. The values of the noise process are independent of each other, and X_{t-m} is independent of \varepsilon_t when m is greater than zero, so for m > 0, \operatorname{E}[\varepsilon_t X_{t-m}] = 0. For m = 0,

\operatorname{E}[\varepsilon_t X_{t}]
= \operatorname{E}\left[\varepsilon_t \left(\sum_{i=1}^p \varphi_i\,X_{t-i} + \varepsilon_t\right)\right]
= \sum_{i=1}^p \varphi_i\, \operatorname{E}[\varepsilon_t\,X_{t-i}] + \operatorname{E}[\varepsilon_t^2]
= 0 + \sigma_\varepsilon^2.

Now we have, for m ≥ 0,

\gamma_m = \operatorname{E}\left[\sum_{i=1}^p \varphi_i\,X_{t-i} X_{t-m}\right] + \sigma_\varepsilon^2 \delta_{m,0}.

Furthermore,

\operatorname{E}\left[\sum_{i=1}^p \varphi_i\,X_{t-i} X_{t-m}\right]
= \sum_{i=1}^p \varphi_i\,\operatorname{E}[X_{t} X_{t-m+i}]
= \sum_{i=1}^p \varphi_i\,\gamma_{m-i},

which yields the Yule-Walker equations:

\gamma_m = \sum_{i=1}^p \varphi_i \gamma_{m-i} + \sigma_\varepsilon^2 \delta_{m,0}

for m ≥ 0. For m < 0,

\gamma_m = \gamma_{-m} = \sum_{i=1}^p \varphi_i \gamma_{|m|-i} + \sigma_\varepsilon^2 \delta_{m,0}.
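The derivation can be checked numerically: sample autocovariances of a simulated AR(2) series should satisfy \gamma_m = \varphi_1\gamma_{m-1} + \varphi_2\gamma_{m-2} for m > 0. A Python/numpy sketch with arbitrary parameter values:

  import numpy as np

  rng = np.random.default_rng(3)
  phi1, phi2 = 0.5, 0.2
  x = np.zeros(200_000)
  eps = rng.normal(size=len(x))
  for t in range(2, len(x)):
      x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]

  n = len(x)
  gamma = [x[:n - k] @ x[k:] / n for k in range(4)]
  for m in (1, 2, 3):
      # gamma_{m-2} for m = 1 is gamma_{-1} = gamma_1, hence the abs()
      print(gamma[m], phi1 * gamma[abs(m - 1)] + phi2 * gamma[abs(m - 2)])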

Spectrum

The power spectral density of an AR(p) process with noise variance Var(Z_t) = \sigma_Z^2 is[1]

S(f) = \frac{\sigma_Z^2}{| 1-\sum_{k=1}^p \varphi_k e^{-2 \pi i k f} |^2}.
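A sketch evaluating this formula in Python/numpy (the helper name ar_psd is illustrative; f is in cycles per sample, |f| \le 1/2):

  import numpy as np

  def ar_psd(f, phi, sigma2=1.0):
      # S(f) = sigma^2 / |1 - sum_k phi_k exp(-2 pi i k f)|^2
      f = np.atleast_1d(np.asarray(f, dtype=float))
      k = np.arange(1, len(phi) + 1)
      denom = 1 - np.exp(-2j * np.pi * np.outer(f, k)) @ np.asarray(phi)
      return sigma2 / np.abs(denom) ** 2

  print(ar_psd([0.0, 0.25, 0.5], [0.7]))  # AR(1): power concentrated at low f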

AR(0)

For white noise (AR(0))

S(f) = \sigma_Z^2.

AR(1)

For AR(1)

S(f) = \frac{\sigma_Z^2}{| 1- \varphi_1 e^{-2 \pi i f} |^2}
     = \frac{\sigma_Z^2}{ 1 + \varphi_1^2 - 2 \varphi_1 \cos(2 \pi f) }

AR(2)

AR(2) processes can be split into three groups depending on the characteristics of the roots

p_{1,2} = \frac{1}{2}\left(\varphi_1 \pm \sqrt{\varphi_1^2 + 4\varphi_2}\right)

of their characteristic equation, according to the sign of the discriminant \varphi_1^2 + 4\varphi_2 (two distinct real roots, a repeated real root, or a complex-conjugate pair). When \varphi_1^2 + 4\varphi_2 < 0, the roots form a complex-conjugate pair, the process is quasi-periodic, and the spectral density peaks (when the argument of \cos^{-1} lies in [-1, 1]) at

f^* = \frac{1}{2\pi}\cos^{-1}\left(\frac{\varphi_1(\varphi_2-1)}{4\varphi_2}\right).

Otherwise the process has real roots, and its autocovariance decays without oscillating.

The process is stable when the roots are within the unit circle, or equivalently when the coefficients are in the triangle -1 < \varphi_2 < 1 - |\varphi_1|.

The full PSD function can be expressed in real form as:

S(f) = \frac{\sigma_Z^2}{1 + \varphi_1^2 + \varphi_2^2 - 2\varphi_1(1-\varphi_2)\cos(2\pi f) - 2\varphi_2\cos(4\pi f)}
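This real form can be checked against the general expression above; a Python/numpy sketch with arbitrary coefficients:

  import numpy as np

  # Verify that the real form of the AR(2) PSD matches the general
  # |1 - sum_k phi_k e^{-2 pi i k f}|^2 expression at several frequencies.
  phi1, phi2, sigma2 = 0.5, -0.3, 1.0
  f = np.linspace(0.01, 0.49, 5)

  general = sigma2 / np.abs(1 - phi1 * np.exp(-2j * np.pi * f)
                              - phi2 * np.exp(-4j * np.pi * f)) ** 2
  real_form = sigma2 / (1 + phi1**2 + phi2**2
                        - 2 * phi1 * (1 - phi2) * np.cos(2 * np.pi * f)
                        - 2 * phi2 * np.cos(4 * np.pi * f))
  print(np.allclose(general, real_form))   # True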

Characteristic polynomial

The autocorrelation function of an AR(p) process can be expressed as

\rho(\tau) = \sum_{k=1}^p a_k y_k^{-|\tau|},

where y_k are the roots of the polynomial

\phi(B) = 1 - \sum_{k=1}^p \varphi_k B^k,

in which B is the backshift operator.

The autocorrelation function of an AR(p) process is thus a sum of decaying exponentials (appearing as damped oscillations when the roots are complex).
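A Python/numpy sketch illustrating this for an AR(2) process with real roots: the tail of the autocorrelation function decays geometrically at the rate set by the root of smallest modulus (all |y_k| > 1 for a stationary process):

  import numpy as np

  phi = [0.6, -0.08]                          # AR(2) with real roots y = 2.5, 5
  coeffs = [-phi[1], -phi[0], 1.0]            # phi(B), highest power of B first
  y = np.roots(coeffs)
  print(y)                                    # [5. , 2.5]

  # ACF by the Yule-Walker recursion; its tail ratio approaches 1/min|y_k|
  rho = [1.0, phi[0] / (1 - phi[1])]
  for tau in range(2, 40):
      rho.append(phi[0] * rho[-1] + phi[1] * rho[-2])
  print(rho[-1] / rho[-2], 1 / np.min(np.abs(y)))   # both ~ 0.4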

Implementations in statistics packages

Statistics packages provide routines for fitting AR models; for example, R's ar function fits autoregressive models to time series.[2]


Notes

  1. ^ a b von Storch, H.; Zwiers, F. W. (2001). Statistical Analysis in Climate Research. Cambridge University Press. ISBN 0521012309.
  2. ^ "Fit Autoregressive Models to Time Series"
